kdANN+: A Rapid AkNN Classifier for Big Data
Abstract
A k-nearest neighbor (kNN) query determines the k points nearest to a given location under a distance metric. An all k-nearest neighbor (AkNN) query is a variation of the kNN query that retrieves the k nearest points for each point in a database. Both are used mainly in spatial databases, and they form the backbone of many location-based applications, among others. In this work, we propose a novel method for classifying multidimensional data using an AkNN algorithm in the MapReduce framework. Our approach exploits space decomposition techniques to carry out the classification procedure in a parallel and distributed manner. To our knowledge, we are the first to study the kNN classification of multidimensional objects from this perspective. Through an extensive experimental evaluation, we show that our solution is efficient, robust, and scalable in processing the given queries.
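As a rough illustration of the two query types defined in the abstract, a brute-force sketch in Python might look like the following. It is not the kdANN+ method, which relies on space decomposition and MapReduce. The function names `knn`, `aknn`, and `classify` are hypothetical, and the choice of Euclidean distance is an assumption.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn(query, points, k):
    """Return the k points closest to `query` (assumed Euclidean metric)."""
    return sorted(points, key=lambda p: dist(p, query))[:k]


def aknn(points, k):
    """For each point, return its k nearest neighbours among the other points.

    Brute force: O(n^2) distance computations, for illustration only.
    """
    result = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        result.append(knn(p, others, k))
    return result


def classify(query, labeled, k):
    """Label `query` by majority vote over its k nearest labelled points.

    `labeled` is a list of (point, label) pairs.
    """
    nearest = sorted(labeled, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
labeled = list(zip(points, ["a", "a", "b", "b", "b"]))
neighbours = aknn(points, 1)
label = classify((0.5, 0.5), labeled, 3)
```

The quadratic cost of this brute-force version is what makes decomposing the space and distributing the work attractive for large data sets.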
Similar resources
Adaptive K-Nearest Neighbor Classifier Based on Features Extracted by Nonparametric Model
In general, there are two main approaches for overcoming the high-dimensional and small sample size (SSS) problem. One is to apply feature extraction or selection to reduce the dimensionality, and then apply the classifier to the reduced-dimensionality data set. The other is to modify the classifier design to suit the SSS problem. This study integrates the two approaches into a new K-neares...
Rapid AkNN Query Processing for Fast Classification of Multidimensional Data in the Cloud
A k-nearest neighbor (kNN) query determines the k points nearest to a specific location under a distance metric. An all k-nearest neighbor (AkNN) query is a variation of the kNN query that retrieves the k nearest points for each point in a database. Both are used mainly in spatial databases, and they form the backbone of many location-based applications, among others (i.e. k...
Distributed In-Memory Processing of All k Nearest Neighbor Queries (Extended
A wide spectrum of Internet-scale mobile applications, ranging from social networking, gaming and entertainment to emergency response and crisis management, all require efficient and scalable All k Nearest Neighbor (AkNN) computations over millions of moving objects every few seconds to be operational. In this paper we present Spitfire, a distributed algorithm that provides a scalable and high-...
Verification of unemployment benefits’ claims using Classifier Combination method
Unemployment insurance is one of the most popular insurance types in the modern world. The Social Security Organization is responsible for checking the unemployment benefits of individuals supported by unemployment insurance. Manual evaluation of unemployment claims requires a great deal of time and money. Data mining and machine learning, as two efficient tools for data analysis, can assist ...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
Journal title:
- Trans. Large-Scale Data- and Knowledge-Centered Systems
Volume: 24
Pages: -
Publication date: 2016